<html>
<head>
    <title>truSPAdes 1.0 Manual (DEPRECATED)</title>
    <style type="text/css">
        .code {
            background-color: lightgray;
        }
    </style>
</head>
<body>
<h1>TruSPAdes 1.0 Manual</h1>

<h2>DEPRECATED SINCE SPADES 3.15</h2>

1. <a href="#sec1">About truSPAdes</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.1. <a href="#sec1.1">Illumina TruSeq data</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.2. <a href="#sec1.2">TruSPAdes input</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.3. <a href="#sec1.3">TruSPAdes performance</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.4. <a href="#sec1.4">Installation</a><br>
2. <a href="#sec2">TruSPAdes input format</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.2.1. <a href="#sec2.1">Dataset file format</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;1.2.2. <a href="#sec2.2">Read files naming for automatic dataset file generation</a><br>
3. <a href="#sec3">Running truSPAdes</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.1. <a href="#sec3.1">TruSPAdes command line options</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;3.2. <a href="#sec3.2">TruSPAdes output</a><br>
4. <a href="#sec4">Citation</a><br>
5. <a href="#sec5">Feedback and bug reports</a><br>
<br>

<a name="sec1"></a>
<h2>1. About truSPAdes</h2>
<p>
    TruSPAdes is an assembler for short reads produced by Illumina TruSeq Long Read technology.
    TruSPAdes accepts as an input a collection of demultiplexed TruSeq reads and assembles long virtual reads.
    This manual will help you to use truSPAdes.</br>

<a name="sec1.1"></a>
<h3>1.1 Illumina TruSeq data</h3>
    TruSeq Synthetic Long Reads technology is based on fragmenting genomic DNA into large segments (about 10Kb long) and forming random pools of the resulting segments (each pool contains about 300 segments).
    Next, these fragments are clonally amplified, sheared, and marked with a unique barcode.
    Afterwards, they are sequenced using the standard Illumina short reads technology.
    All short reads originating from the same barcode are assembled together resulting in a set of long contigs (this step is called TruSeq barcode assembly).
    Ideally, the result of such sequencing effort for a single barcode is the collection of 300 fragments (each fragment is about 10kb long) from a genome forming 300 long virtual reads.
    Together, these segments are expected to cover about 3 million nucleotides (barcode span).
    TruSPAdes is a tool for barcode assembly.
    TruSPAdes assembles each barcode separately and outputs assembled contigs as Virtual TruSeq Long Reads (TSLRs).

<a name="sec1.2"></a>
<h3>1.2 TruSPAdes input</h3>
    TruSPAdes requires demultiplexed sequencing data as an input.
    Each barcode is assembled from one or several (in case multiple lanes are available) libraries of paired-end reads in fastq format.
    Also note that truSPAdes can handle both compressed and decompressed reads.
    TruSPAdes uses <a href="#sec1.2.1">dataset file format</a> for TruSeq dataset description.
    Dataset file can be either created manually or generated automatically. 
    Since TruSeq reads are provided as a service by Illumina, most TruSeq read files have <a href="#sec1.2.2">particular naming patterns</a> that we use for autogeneration of dataset file.
    See details in <a href="#sec2">section 2</a>.

<a name="sec1.3"></a>
<h3>1.3 TruSPAdes performance</h3>
    TruSPAdes assembles standard Illumina TruSeq dataset (consisting of 384 barcodes) in 30 hours using 8 threads of Intel Xeon 2.27GHz processor and requires less than 16 Gb RAM (2Gb RAM per thread).

<a name="sec1.4"></a>
<h3>1.4 Installation</h3>
<p>
    TruSPAdes comes as a part of SPAdes genome assembler. See <a href = manual.html#sec2>SPAdes manual</a> for installation instructions.


<a name="sec2"></a>
<h2>2. TruSPAdes input format</h2>
    TruSPAdes uses <a href="#sec1.2.1">dataset file format</a> for TruSeq dataset description.
    Dataset file can be either created manually or generated automatically.
    Below we describe dataset file format and conditions that enable automatic generation of dataset files.
<a name="sec2.1"></a>
<h3>2.1 Dataset file format</h3>
    Dataset file describes a collection of read files.
    Each line in dataset file contains description of reads corresponding to a single barcode.
    Description should start with id of barcode (a string consisting of letters and digits) followed by paths to reads.
    For example, for dataset with three barcodes dataset file should look like this:

<pre class="code">
<code>
    barcodeId1 /FULL_PATH_TO_LEFT_READS1/LEFT_READS_FILE_NAME1.fastq.gz /FULL_PATH_TO_RIGHT_READS1/RIGHT_READS_FILE_NAME1.fastq.gz
    barcodeId2 /FULL_PATH_TO_LEFT_READS2/LEFT_READS_FILE_NAME2.fastq.gz /FULL_PATH_TO_RIGHT_READS2/RIGHT_READS_FILE_NAME2.fastq.gz
    barcodeId3 /FULL_PATH_TO_LEFT_READS3/LEFT_READS_FILE_NAME3.fastq.gz /FULL_PATH_TO_RIGHT_READS3/RIGHT_READS_FILE_NAME3.fastq.gz
</code>
</pre>
    In case several lanes are available for a barcode, reads for each lane should be provided in the same line.
    Reads from the same barcode should be written consequently, e.g., in case two lanes are available for barcode, its description should look like this:
<pre class="code">
<code>
    barcodeId /LEFT_READS_PATH_L1.fastq.gz /RIGHT_READS_PATH_L1.fastq.gz /LEFT_READS_PATH_L2.fastq.gz /RIGHT_READS_PATH_L2.fastq.gz
</code>
</pre>
Dataset file can be provided using <code>--dataset</code> option (see <a href="#sec3">section 3</a>).

<a name="sec2.2"></a>
<h3>2.2 Read files naming for automatic dataset file generation</h3>
    Since TruSeq reads are provided as a service by Illumina, most TruSeq read files have particular naming patterns that we use to automatically generate dataset files.
    If read naming in your dataset reads satisfies conditions described below you can avoid manual creation of dataset file and use <code>--input-dir</code> option instead (see <a href="#sec3">section 3</a>).
    The conditions that enable automatic dataset file generation are as follows:
    <ul>
        <li>All reads should be in one or several directories provided using <code>--input-dir</code> option</li>
        <li>No other files should be present in these directories</li>
        <li>Names of files with left (right) reads should contain "R1" ("R2") as their substring.</li>
        <li>Files with left and right reads of the same library should be in the same directory and their names should differ only by substitution of "R1" to "R2"</li>
        <li>In case several lanes are available, reads from the same lane can be marked with "L&#60;n>" label (i.e. contain it as a substring of read name) where &#60;n> is the lane number. </li>
        <li>Reads from the same barcode but from different lanes can be put into different directories but their names should either coinside or differ only by "L<n>" markings.</li>
        <li>All barcodes should have the same number of paired-end libraries (lanes). This is to check for possibly missing or extra files in input directories.</li>
    </ul>
    For example dataset with two lanes and two barcodes can be put into single directory with the following file naming:
<pre class = "code">
<code>
    dataset-directory:
    &nbsp;&nbsp;&nbsp;&nbsp;reads_L1_R1.fastq
    &nbsp;&nbsp;&nbsp;&nbsp;reads_L1_R2.fastq
    &nbsp;&nbsp;&nbsp;&nbsp;reads_L2_R1.fastq
    &nbsp;&nbsp;&nbsp;&nbsp;reads_L2_R2.fastq
</code> 
</pre>
    Or dataset files can be put into two directories:
<pre class = "code">
<code>
    dataset_directory_L1:
    &nbsp;&nbsp;&nbsp;&nbsp;reads_R1.fastq
    &nbsp;&nbsp;&nbsp;&nbsp;reads_R2.fastq
    dataset_directory_L2:
    &nbsp;&nbsp;&nbsp;&nbsp;reads_R1.fastq
    &nbsp;&nbsp;&nbsp;&nbsp;reads_R2.fastq
</code> 
</pre>    


<a name="sec3"></a>
<h2>3. Running truSPAdes</h2>

<a name="sec3.1"></a>
<h3>3.1 TruSPAdes command line options </h3>
<p>
    To run truSPAdes from the command line, type 

<pre class="code">
<code>
    truspades.py [options] -o &lt;output_dir>
</code>
</pre>
Note that we assume that truSPAdes installation directory is added to the <code>PATH</code> variable (provide full path to truSPAdes executable otherwise: <code>&lt;truspades installation dir>/truspades.py</code>). 

<a name="basicopt"></a>
<h4>Basic options</h4>
<p>
    <code>-o &lt;output_dir> </code><br>
    &nbsp;&nbsp;&nbsp;&nbsp;Specify the output directory. Required option.
</p>

<p>
    <code>-h</code> (or <code>--help</code>)<br>
    &nbsp;&nbsp;&nbsp;&nbsp;Prints help.
</p>

<p>
    <code>-v</code> (or <code>--version</code>)<br>
    &nbsp;&nbsp;&nbsp;&nbsp;Prints version.
</p>

<p>
    <code>--continue</code><br>
    &nbsp;&nbsp;&nbsp;&nbsp;Continues truSPAdes run from the specified output folder.
</p>

<p>
    <code>-t &lt;int></code> (or <code>--threads &lt;int></code>)<br>
    &nbsp;&nbsp;&nbsp;&nbsp;Number of threads. The default value is 8.
</p>

<a name="inputdata"></a>
<h4>Input data</h4>
<p>
    <code>--dataset &lt;file_name> </code><br>
    &nbsp;&nbsp;&nbsp;&nbsp;Dataset file foratted as specified in <a href="#sec2.1">section 2.1</a>.
</p>

<p>
    <code>--input-dir &lt;dir_name> </code><br>
    &nbsp;&nbsp;&nbsp;&nbsp;Directory containing reads. Note that naming of read files should satisfy conditions described in <a href="#sec2.2">section 2.2</a>.
</p>

<a name="sec3.2">
<h3>3.2. TruSPAdes output</h3>
<p>
    TruSPAdes stores all output files in <code>&lt;output_dir> </code>, which is set by the user.
    Resulting TruSeq long reads will be stored in <code>&lt;output_dir>/TSLRs.fastq</code>
    The full list of <code>&lt;output_dir></code> content is presented below: 
</p>
<pre>
    <code>TSLRs.fastq</code> &ndash; <i>resulting truseq long reads</i>
    <code>TSLRs.fasta</code> &ndash; <i>resulting truseq long reads in fasta format</i>
    <code>dataset.info</code> &ndash; <i>dataset file</i>
    <code>truspades.log</code> &ndash; <i>truSPAdes log file</i>
    <code>barcodes</code> &ndash; <i>directory containing output files for separate barcodes</i>
    <code>logs</code> &ndash; <i>directory containing log files for barcode assembly</i>
</pre>

<a name="sec4">
<h2>4. Citation</h2>
<p>
    If you use truSPAdes in your research, please include <a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3737.html" target="_blank">Bankevich & Pevzner, 2016</a> in your reference list.

<a name="sec5">
<h2>5. Feedback and bug reports</h2>
<p>
    Your comments, bug reports, and suggestions are very welcomed. They will help us to further improve truSPAdes.
    
<p>
    If you have any troubles running truSPAdes, please send us <code>dataset.info</code> and <code>truspades.log</code> from the directory <code>&lt;output_dir></code>.

<p>
    Address for communications: <a href="mailto:spades.support@cab.spbu.ru" target="_blank">spades.support@cab.spbu.ru</a>.

<br/><br/><br/><br/><br/>

</body>
</html>
